Named Entity Recognition through Redundancy Driven Classifiers
Abstract
We present Typhoon, a classifier combination system for Named Entity Recognition (NER), in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. Data Redundancy arises when the same entity occurs in different places in documents, whereas Patterns are the 2-grams, 3-grams, 4-grams and 5-grams that precede and follow entities in documents. The system consists of two classifiers in cascade. It can also run with a single classifier, which makes it about 100 times faster (roughly 20,000 tokens/sec), while the second classifier in the cascade can be used when more accuracy is needed. The system can also use additional features, such as the output of a text classifier that recognizes the category to which the story belongs. The system achieved the best performance on the Italian NER task at EVALITA 2009, with an F1 of 0.82.
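The two sources of evidence named in the abstract can be illustrated with a minimal Python sketch. It is not the Typhoon implementation: the function names `context_patterns` and `redundancy_vote` are hypothetical, and the majority vote is only an assumed way of exploiting repeated mentions, since the abstract does not say how redundancy is used.

```python
from collections import Counter

def context_patterns(tokens, start, end, n_min=2, n_max=5):
    """Collect the n-grams (2 to 5 tokens by default) that immediately
    precede and follow the entity span tokens[start:end]."""
    patterns = []
    for n in range(n_min, n_max + 1):
        if start - n >= 0:  # enough tokens to the left of the entity
            patterns.append(("before", tuple(tokens[start - n:start])))
        if end + n <= len(tokens):  # enough tokens to the right
            patterns.append(("after", tuple(tokens[end:end + n])))
    return patterns

def redundancy_vote(mentions):
    """Assumed use of redundancy: give every occurrence of the same
    surface string the label predicted most often for that string."""
    votes = {}
    for surface, label in mentions:
        votes.setdefault(surface, Counter())[label] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

tokens = "the mayor of Rome said that he".split()
patterns = context_patterns(tokens, 3, 4)  # the entity "Rome" is tokens[3:4]
labels = redundancy_vote([("Rome", "LOC"), ("Rome", "LOC"), ("Rome", "ORG")])
```

In this example only the 2-grams and 3-grams fit inside the sentence, so `patterns` holds four entries, and the three mentions of "Rome" are resolved to a single label.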
Related articles
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Biomedical Named Entity Recognition Based on Classifiers Ensemble
In this paper, we present classifier ensemble approaches for biomedical named entity recognition. Generalized Winnow, Conditional Random Fields, Support Vector Machine, and Maximum Entropy are combined through three different strategies. We demonstrate the effectiveness of the classifier ensemble strategies and compare their performance with standalone classifier systems. In the experiments on the...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Improving Persian Named Entity Recognition Using the Ezafe Marker
Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages in a text, are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
A comprehensive experimental comparison of the aggregation techniques for face recognition
In face recognition, one of the most important problems to tackle is the large amount of data and the redundancy of information contained in facial images. There are numerous approaches attempting to reduce this redundancy. One of them is information aggregation based on the results of classifiers built on selected facial areas being the most salient regions from the point of view of classificati...